MF3-H02: lock home channel before reading status in event handlers#820
Conversation
…lers MF3-H02: home-channel event handlers read the channel via GetChannelByID before acquiring LockUserState, then acted on that pre-lock snapshot. LockUserState locks the user balance row, not the channel row, so a concurrent submit_state Finalize could flip the channel Open->Closing in the read->lock window. HandleHomeChannelCheckpointed's non-challenged path persists the snapshot verbatim, so the stale Open clobbered the committed Closing and reopened a finalized channel. Add DBStore.LockUserStateForHomeChannel: it derives the (wallet, asset) lock key from the channel inside SQL and, on Postgres, locks the balance row and reads the channel in one statement (SELECT ... JOIN ... FOR UPDATE OF b), so no stale pre-lock snapshot can exist. All four home-channel handlers (Created, Checkpointed, Challenged, Closed) now lock-then-read via this method. Escrow handlers are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ihsraham
left a comment
There was a problem hiding this comment.
Thanks, the handler-side change is moving in the right direction: the home-channel event handlers now go through one helper before mutating status, and the production reactor path is transaction-wrapped.
I am requesting changes on one blocker because the Postgres helper still does not guarantee a post-lock channel refresh, so H-02 is not fully closed yet. I left the specific note inline.
I am not treating the current SDK/MCP audit failure as part of this review, since that dependency fix is being handled separately and should be pulled in before merge.
LockUserStateForHomeChannel previously locked the balance row and read the channel in a single `SELECT c.* ... FOR UPDATE OF b`. In READ COMMITTED, `FOR UPDATE OF b` only re-evaluates the locked user_balances row after the lock wait; the joined channels row is served from the statement-start snapshot. A checkpoint handler could begin with status=Open, block on the balance lock while a concurrent Finalize committed status=Closing, then return the stale Open channel and persist it. Acquire the (wallet, asset) balance lock first, then read the channel in a separate statement so it takes a fresh snapshot reflecting any transition committed during the wait. Add a Postgres concurrency regression that holds the balance lock, flips status to Closing, and commits while the call is blocked; it asserts the returned status is Closing (skips on non-postgres). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ihsraham
left a comment
There was a problem hiding this comment.
Approving for H-02 closure. The new commit fixes the stale snapshot issue by acquiring the (wallet, asset) balance lock first and reading the channel in a separate statement after the lock is held. The added Postgres regression covers the wait-and-refresh case Anton described.
Non-blocking follow-up: that regression only runs with TEST_DB_DRIVER=postgres, so a dedicated pg CI lane would make this easier to keep covered. I am not treating that as a blocker for this fix.
| // Acquire the user's balance-row lock and read the channel under it before mutating | ||
| // channel status or backfilling the off-chain head. Receiver/release issuance paths | ||
| // lock the same row up front and then re-check Status via CheckActiveChannel; without | ||
| // this lock an in-flight RPC can read Status=Void, decide to store an unsigned receiver | ||
| // row, and commit before we flip to Open — leaving the latest head unsigned on a | ||
| // technically opened channel. The lock+read is a single store call so the channel | ||
| // snapshot is consistent with the lock. See HandleHomeChannelCheckpointed and | ||
| // HandleHomeChannelClosed for the same pattern. |
There was a problem hiding this comment.
I think this comment needs shortening. Specifying that "user's balance-row lock guards against race-conditions of channel status being changed in the meantime" should be enough.
There was a problem hiding this comment.
Yep, trimmed it down in 7f6d275 — now just says the lock guards against a concurrent submit_state flipping status between the read and our write, plus the see-also pointer.
nksazonov
left a comment
There was a problem hiding this comment.
Clean implementation of the lock-before-read pattern across all four home-channel event handlers. The new regression test TestHandleHomeChannelCheckpointed_DoesNotReopenFinalizedChannel directly pins the race described in the audit finding.
- db_store: document READ COMMITTED isolation prerequisite for the post-lock channel refresh (negated under REPEATABLE READ/SERIALIZABLE) - core/store interface: narrow LockUserStateForHomeChannel doc — channel is read under the lock on postgres; pre-lock snapshot on sqlite (tests) - channel_test: replace fixed time.Sleep with a deterministic pg_locks poll so the wait-and-refresh race is actually contended, not a no-op - service: shorten HandleHomeChannelCreated lock comment Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Finding
Home-channel event handlers read the channel via
GetChannelByIDbefore acquiringLockUserState, then mutated/persisted that pre-lock snapshot.LockUserStatelocks the user balance row, not the channel row — so a concurrentsubmit_stateFinalize could flip the channelOpen→Closingin the read→lock window.HandleHomeChannelCheckpointed's non-challenged path persists the channel snapshot verbatim (it only reassignsStatuson theChallengedbranch). A staleOpensnapshot therefore overwrote the committedClosing, reopening a finalized channel — bypassing the C-01 remediation and enabling the post-finalize epoch-rebind / fake-HomeDepositspend chain.Fix
New
DBStore.LockUserStateForHomeChannel(channelID):(wallet, asset)lock key from the channel inside SQL — no pre-lock Go snapshot exists.SELECT c.* FROM channels c JOIN user_balances b ... WHERE c.channel_id = ? FOR UPDATE OF b— locks the balance row (same row/strength asLockUserState) and reads the channel in one statement. The stale-snapshot window is structurally impossible, not just guarded.LockUserState→ read.All four home-channel handlers (
Created,Checkpointed,Challenged,Closed) now lock-then-read via this method. Escrow handlers are unchanged (different status surface, out of scope).Wired into
core.ChannelHubEventHandlerStore,evm.ChannelHubReactorStore,database.DatabaseStore+ both mocks.Tests
TestHandleHomeChannelCheckpointed_DoesNotReopenFinalizedChannel— post-lockClosingsnapshot must persistClosing, never reopen toOpen.TestDBStore_LockUserStateForHomeChannel— returns channel + ensures balance row; nil for missing channel.go build ./...,go vet, andgo test(event_handlers, evm, store/database) pass.🤖 Generated with Claude Code